A Theoretical Study of Extreme Parallelism in Sequence Alignments
Abstract
In this paper, we describe fully parallelized architectures for one-to-one, one-to-many, and many-to-many sequence alignment using the Smith-Waterman algorithm. The architectures exploit parallelism and pipelining to the greatest possible extent, taking advantage of both intra-sequence and inter-sequence parallelism to achieve high speed and throughput. First, we describe a parallelized Smith-Waterman algorithm for general single instruction, multiple data (SIMD) computers. The algorithm runs in O(m+n) time, where m and n are the lengths of the two biological sequences being aligned. Next, we propose a very-large-scale integration (VLSI) implementation of the parallel algorithm. Third, we incorporate a pipelined architecture into the proposed VLSI circuit, producing a pipelined processor that aligns a query sequence against a database of sequences in O(m+n+L) time, where m is the length of the query sequence, and n and L are the maximum sequence length and the number of sequences in the database, respectively. Finally, we use our pipelined architecture to compute the pairwise alignments of all pairs in a group of L sequences, each of length at most m, in O(mL) time. Computing all pairwise alignments is essential to the overlap-layout-consensus (OLC) approach to de novo assembly.
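A minimal Python sketch of the anti-diagonal (wavefront) dependency that makes an O(m+n) parallel Smith-Waterman possible. Each cell H[i][j] depends only on cells from the two previous anti-diagonals, so all cells with the same i + j could be computed simultaneously, and the matrix is complete after m + n - 1 diagonal steps. This is a sequential simulation under stated assumptions, not the authors' SIMD or VLSI design. The function name smith_waterman_wavefront, the linear gap penalty, and the scores (match +2, mismatch -1, gap -1) are illustrative choices, not taken from the paper.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the Smith-Waterman matrix one anti-diagonal at a time.

    Hypothetical helper for illustration. Assumes a linear gap penalty and
    example scores (match +2, mismatch -1, gap -1); the paper's scoring
    scheme is not specified. Returns (best local score, diagonal steps).
    """
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        return 0, 0
    # H[i][j] is the best local alignment score ending at a[i-1], b[j-1];
    # row 0 and column 0 stay 0 as the Smith-Waterman boundary.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    steps = 0
    for k in range(2, m + n + 1):  # anti-diagonal index k = i + j
        steps += 1
        # Cells on diagonal k read only diagonals k-1 and k-2, so this inner
        # loop has no internal dependencies and could run fully in parallel.
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # diagonal k-2
                          H[i - 1][j] + gap,    # diagonal k-1
                          H[i][j - 1] + gap)    # diagonal k-1
            best = max(best, H[i][j])
    return best, steps


print(smith_waterman_wavefront("ACGAT", "ACGT"))
```

Here the call returns (7, 8): the best local alignment ACGAT / ACG-T scores 2 + 2 + 2 - 1 + 2 = 7, and a 5-by-4 matrix has 8 anti-diagonals. On parallel hardware the inner loop would be spread over up to min(m, n) processing elements, so the running time is governed by the number of diagonal steps rather than by the m·n cell count.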